Authors: Idan Glassberg, Tom Hope
This short technical report demonstrates a simple technique that yields state-of-the-art results in medical image-text matching tasks. We analyze the use of OpenAI’s CLIP, a general image-text matching model, and observe that CLIP’s limited textual input size has a negative impact on downstream performance in the medical domain, where encoding longer textual contexts is often required. We therefore train and release ClipMD, which uses a simple sliding window technique to encode textual captions. ClipMD was tested on two medical image-text datasets and compared with other image-text matching models. The results show that ClipMD outperforms the other models on both datasets by a large margin. We make our code and pretrained model publicly available.
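To make the sliding window idea concrete, here is a minimal Python sketch of one way to encode a caption that exceeds a fixed input size. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the per-window results are combined, so mean pooling is assumed here, and the names `sliding_windows`, `encode_long_caption`, and `toy_encoder` are hypothetical. The default window of 77 reflects CLIP's text context length; the stride of 38 is an arbitrary choice for the example.

```python
from typing import Callable, List, Sequence


def sliding_windows(token_ids: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Split token_ids into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        # Short captions fit in a single window.
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the caption
        start += stride
    return windows


def encode_long_caption(
    token_ids: Sequence[int],
    encode_window: Callable[[List[int]], List[float]],
    window: int = 77,   # CLIP's text context length
    stride: int = 38,   # assumed overlap of roughly half a window
) -> List[float]:
    """Encode each window separately, then average the embeddings.

    Mean pooling is an assumption made for this sketch; the source
    does not specify how window embeddings are combined.
    """
    embeddings = [encode_window(w) for w in sliding_windows(token_ids, window, stride)]
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def toy_encoder(window_ids: List[int]) -> List[float]:
    """Hypothetical stand-in for a real text encoder: returns [length, sum]."""
    return [float(len(window_ids)), float(sum(window_ids))]


if __name__ == "__main__":
    print(sliding_windows(list(range(10)), window=4, stride=2))
    print(encode_long_caption(list(range(10)), toy_encoder, window=4, stride=2))
```

In practice `encode_window` would wrap a real text encoder, and the window size would need to leave room for any start and end tokens the tokenizer adds. A stride smaller than the window keeps overlapping context between neighbouring windows, while a stride larger than the window would skip tokens.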
Paper link: http://arxiv.org/pdf/2303.13340v1